{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KimMZUVqcJ8_"
      },
      "source": [
        "##### Copyright 2021 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "BRQ6HQ8zcV5v"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BlWzg1D9_EhW"
      },
      "source": [
        "# Inspecting Quantization Errors with Quantization Debugger"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XLoHL19yb-a0"
      },
      "source": [
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/lite/performance/quantization_debugger\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/quantization_debugger.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/quantization_debugger.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/tensorflow/tensorflow/lite/g3doc/performance/quantization_debugger.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://tfhub.dev/google/imagenet/mobilenet_v3_small_100_224/classification/5\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/hub_logo_32px.png\" /\u003eSee TF Hub model\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MWO_yYDGcGWY"
      },
      "source": [
        "Although full-integer quantization improves model size and latency, the\n",
        "quantized model won't always work as expected. The model quality (e.g.\n",
        "accuracy, mAP, WER) is usually expected to be slightly lower than that of the\n",
        "original float model. However, there are cases where the model quality falls\n",
        "below your expectations or the model generates completely wrong results.\n",
        "\n",
        "When this problem happens, it's tricky and painful to find the root cause of\n",
        "the quantization error, and even more difficult to fix it. To assist this\n",
        "model inspection process, the **quantization debugger** can be used to\n",
        "identify problematic layers, and **selective quantization** can keep those\n",
        "problematic layers in float so that the model accuracy can be recovered at the\n",
        "cost of a reduced benefit from quantization.\n",
        "\n",
        "Note: This API is experimental, and there might be breaking changes to the API\n",
        "as it is improved."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9kD29R1I_Mn6"
      },
      "source": [
        "## Quantization Debugger\n",
        "\n",
        "The quantization debugger makes it possible to analyze quantization quality\n",
        "metrics for an existing model. It automates running the model with a debug\n",
        "dataset and collecting quantization quality metrics for each tensor.\n",
        "\n",
        "Note: The quantization debugger and selective quantization currently only work\n",
        "for full-integer quantization with int8 activations."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "221Qon7G_PmZ"
      },
      "source": [
        "### Prerequisites\n",
        "\n",
        "If you already have a pipeline to quantize a model, you have all the necessary\n",
        "pieces to run the quantization debugger!\n",
        "\n",
        "*   Model to quantize\n",
        "*   Representative dataset\n",
        "\n",
        "In addition to the model and data, you will need a data processing framework\n",
        "(e.g. pandas, Google Sheets) to analyze the exported results."
      ]
    },
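    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The representative dataset (and the debug dataset used later) is passed as a\n",
        "callable that returns a generator. As a minimal sketch, assuming a single-input\n",
        "image model like the one in this tutorial, each yielded element is a list holding\n",
        "one float32 array with a batch dimension of 1. The helper name\n",
        "`make_representative_dataset` and the random images are hypothetical stand-ins:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "# Hypothetical helper: wraps an array of images as a representative dataset.\n",
        "def make_representative_dataset(images):\n",
        "\n",
        "  def _data_gen():\n",
        "    for image in images:\n",
        "      # One list entry per model input; each entry keeps a batch dimension of 1.\n",
        "      yield [image[np.newaxis, ...].astype(np.float32)]\n",
        "\n",
        "  return _data_gen\n",
        "\n",
        "\n",
        "# Hypothetical stand-in data: 4 random 224x224 RGB images scaled to [0, 1).\n",
        "images = np.random.default_rng(0).random((4, 224, 224, 3))\n",
        "samples = list(make_representative_dataset(images)())\n",
        "print(len(samples), samples[0][0].shape, samples[0][0].dtype)\n",
        "```\n",
        "\n",
        "The `representative_dataset` helper in the setup follows the same pattern, but\n",
        "reads images from a `tf.data` pipeline instead of a NumPy array."
      ]
    },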
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qTEEzJWo_iZ_"
      },
      "source": [
        "### Setup\n",
        "\n",
        "This section prepares the libraries, a MobileNet v3 model, and a test dataset\n",
        "of 100 images."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "l7epUDUP_6qo"
      },
      "outputs": [],
      "source": [
        "# Quantization debugger is available from TensorFlow 2.7.0\n",
        "!pip uninstall -y tensorflow\n",
        "!pip install tf-nightly\n",
        "!pip install tensorflow_datasets --upgrade  # imagenet_v2 needs latest checksum"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LLsgiUZe_hIa"
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import tensorflow as tf\n",
        "import tensorflow_datasets as tfds\n",
        "import tensorflow_hub as hub"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "veWjO3u32vzz"
      },
      "outputs": [],
      "source": [
        "#@title Boilerplates and helpers\n",
        "MODEL_URI = 'https://tfhub.dev/google/imagenet/mobilenet_v3_small_100_224/classification/5'\n",
        "\n",
        "\n",
        "def process_image(data):\n",
        "  data['image'] = tf.image.resize(data['image'], (224, 224)) / 255.0\n",
        "  return data\n",
        "\n",
        "\n",
        "# Representative dataset\n",
        "def representative_dataset(dataset):\n",
        "\n",
        "  def _data_gen():\n",
        "    for data in dataset.batch(1):\n",
        "      yield [data['image']]\n",
        "\n",
        "  return _data_gen\n",
        "\n",
        "\n",
        "def eval_tflite(tflite_model, dataset):\n",
        "  \"\"\"Evaluates tensorflow lite classification model with the given dataset.\"\"\"\n",
        "  interpreter = tf.lite.Interpreter(model_content=tflite_model)\n",
        "  interpreter.allocate_tensors()\n",
        "\n",
        "  input_idx = interpreter.get_input_details()[0]['index']\n",
        "  output_idx = interpreter.get_output_details()[0]['index']\n",
        "\n",
        "  results = []\n",
        "\n",
        "  for data in representative_dataset(dataset)():\n",
        "    interpreter.set_tensor(input_idx, data[0])\n",
        "    interpreter.invoke()\n",
        "    results.append(interpreter.get_tensor(output_idx).flatten())\n",
        "\n",
        "  results = np.array(results)\n",
        "  gt_labels = np.array(list(dataset.map(lambda data: data['label'] + 1)))\n",
        "  accuracy = (\n",
        "      np.sum(np.argsort(results, axis=1)[:, -5:] == gt_labels.reshape(-1, 1)) /\n",
        "      gt_labels.size)\n",
        "  print(f'Top-5 accuracy (quantized): {accuracy * 100:.2f}%')\n",
        "\n",
        "\n",
        "model = tf.keras.Sequential([\n",
        "  tf.keras.layers.Input(shape=(224, 224, 3), batch_size=1),\n",
        "  hub.KerasLayer(MODEL_URI)\n",
        "])\n",
        "model.compile(\n",
        "    loss='sparse_categorical_crossentropy',\n",
        "    metrics='sparse_top_k_categorical_accuracy')\n",
        "model.build([1, 224, 224, 3])\n",
        "\n",
        "# Prepare dataset with 100 examples\n",
        "ds = tfds.load('imagenet_v2', split='test[:1%]')\n",
        "ds = ds.map(process_image)\n",
        "\n",
        "converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
        "converter.representative_dataset = representative_dataset(ds)\n",
        "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
        "quantized_model = converter.convert()"
      ]
    },
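    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `eval_tflite` helper computes top-5 accuracy by sorting each row of scores\n",
        "with `np.argsort` and checking whether the ground-truth label is among the last\n",
        "five indices. The `+ 1` label shift most likely accounts for the background\n",
        "class at index 0 of the TF Hub model's 1001 outputs. The standalone sketch below\n",
        "isolates the top-k computation; `top_k_accuracy` is a hypothetical name, and the\n",
        "toy scores use `k=2` so the result is easy to check by hand:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "def top_k_accuracy(scores, labels, k=5):\n",
        "  # Indices of the k highest scores in each row (np.argsort sorts ascending).\n",
        "  top_k = np.argsort(scores, axis=1)[:, -k:]\n",
        "  # A row counts as correct if its label appears among its top-k indices.\n",
        "  hits = np.any(top_k == labels.reshape(-1, 1), axis=1)\n",
        "  return np.mean(hits)\n",
        "\n",
        "\n",
        "scores = np.array([[0.1, 0.7, 0.2],\n",
        "                   [0.5, 0.3, 0.2],\n",
        "                   [0.2, 0.3, 0.5]])\n",
        "labels = np.array([2, 2, 0])\n",
        "print(top_k_accuracy(scores, labels, k=2))  # only the first row is a hit\n",
        "print(top_k_accuracy(scores, labels, k=3))  # every label is in the top 3\n",
        "```\n",
        "\n",
        "The setup helper sums exact matches and divides by the number of examples, which\n",
        "is equivalent to the `np.any` form here because each row contains a given index\n",
        "at most once."
      ]
    },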
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "7mX-R-xK4ADB"
      },
      "outputs": [],
      "source": [
        "test_ds = ds.map(lambda data: (data['image'], data['label'] + 1)).batch(16)\n",
        "loss, acc = model.evaluate(test_ds)\n",
        "print(f'Top-5 accuracy (float): {acc * 100:.2f}%')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Mnp6yBnJSCoh"
      },
      "outputs": [],
      "source": [
        "eval_tflite(quantized_model, ds)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tblkk3cxxpuw"
      },
      "source": [
        "The original float model has a much higher top-5 accuracy on this small\n",
        "dataset than the quantized model, which shows a significant accuracy loss."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dBBcfCQw_Wqd"
      },
      "source": [
        "### Step 1. Debugger preparation\n",
        "\n",
        "The easiest way to use the quantization debugger is to provide the\n",
        "`tf.lite.TFLiteConverter` that you have been using to quantize the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NOByihbD_NZZ"
      },
      "outputs": [],
      "source": [
        "converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
        "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
        "converter.representative_dataset = representative_dataset(ds)\n",
        "\n",
        "# The debug dataset should have the same format as the representative dataset.\n",
        "debugger = tf.lite.experimental.QuantizationDebugger(\n",
        "    converter=converter, debug_dataset=representative_dataset(ds))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9vR1IIrmQS9W"
      },
      "source": [
        "### Step 2. Running the debugger and getting the results\n",
        "\n",
        "When you call `QuantizationDebugger.run()`, the debugger logs the differences\n",
        "between float tensors and quantized tensors for the same op location, and\n",
        "processes them with the given metrics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "HsUM54g-_E52"
      },
      "outputs": [],
      "source": [
        "debugger.run()"
      ]
    },
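    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Conceptually, each layer's statistics come from the differences between its float\n",
        "and quantized outputs: every debug example's difference is reduced to a few scalar\n",
        "metrics, and each metric is averaged over all examples. The NumPy sketch below\n",
        "mimics that reduction for a single layer. The metric names and formulas are an\n",
        "assumption based on the five default metrics described in the Custom metrics\n",
        "section, not a copy of the debugger's implementation, and the differences are\n",
        "random stand-ins rather than real layer outputs:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "# Assumed per-example metrics, each reducing a float-quant difference to a scalar.\n",
        "LAYER_METRICS = {\n",
        "    'num_elements': lambda diff: diff.size,\n",
        "    'stddev': lambda diff: np.std(diff),\n",
        "    'mean_error': lambda diff: np.mean(diff),\n",
        "    'max_abs_error': lambda diff: np.max(np.abs(diff)),\n",
        "    'mean_squared_error': lambda diff: np.mean(diff ** 2),\n",
        "}\n",
        "\n",
        "\n",
        "def layer_statistics(diffs):\n",
        "  # diffs: one float-minus-quantized array per debug example.\n",
        "  per_example = {name: [fn(d) for d in diffs] for name, fn in LAYER_METRICS.items()}\n",
        "  # Each reported value is the average of the metric over all examples.\n",
        "  return {name: float(np.mean(values)) for name, values in per_example.items()}\n",
        "\n",
        "\n",
        "# Hypothetical diffs for one layer over 3 examples, as if from an int8 layer.\n",
        "rng = np.random.default_rng(0)\n",
        "scale = 0.05\n",
        "diffs = [rng.uniform(-0.5, 0.5, size=(1, 8, 8, 16)) * scale for _ in range(3)]\n",
        "stats = layer_statistics(diffs)\n",
        "print(stats)\n",
        "```\n",
        "\n",
        "With uniform rounding noise like this, `sqrt(mean_squared_error) / scale` lands\n",
        "near 0.289, the baseline used in the data analysis step."
      ]
    },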
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yQpX_SBUQXvr"
      },
      "source": [
        "The processed metrics can be accessed with\n",
        "`QuantizationDebugger.layer_statistics`, or can be dumped to a text file in CSV\n",
        "format with `QuantizationDebugger.layer_statistics_dump()`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "U-AGYUAbQUmx"
      },
      "outputs": [],
      "source": [
        "RESULTS_FILE = '/tmp/debugger_results.csv'\n",
        "with open(RESULTS_FILE, 'w') as f:\n",
        "  debugger.layer_statistics_dump(f)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LQzEi6VnQaen"
      },
      "outputs": [],
      "source": [
        "!head /tmp/debugger_results.csv"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4np7VqU-Qfke"
      },
      "source": [
        "For each row in the dump, the op name and index come first, followed by\n",
        "quantization parameters and error metrics (including\n",
        "[user-defined error metrics](#custom-metrics), if any). The resulting CSV file\n",
        "can be used to pick problematic layers with large quantization error metrics.\n",
        "\n",
        "With pandas or other data processing libraries, we can inspect detailed\n",
        "per-layer error metrics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XUcSqYFGQb-f"
      },
      "outputs": [],
      "source": [
        "layer_stats = pd.read_csv(RESULTS_FILE)\n",
        "layer_stats.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7C_oHxWFOV6M"
      },
      "source": [
        "### Step 3. Data analysis\n",
        "\n",
        "There are various ways to analyze the results. First, let's add some useful\n",
        "metrics derived from the debugger's outputs. (`scale` is the quantization\n",
        "scale factor for each tensor.)\n",
        "\n",
        "*   Range (`255 * scale`)\n",
        "*   RMSE / scale (`sqrt(mean_squared_error) / scale`)\n",
        "\n",
        "The `RMSE / scale` is close to `1 / sqrt(12)` (~ 0.289) when the quantized\n",
        "distribution is similar to the original float distribution, indicating good\n",
        "quantization. The larger the value, the more likely it is that the layer is not\n",
        "quantized well."
      ]
    },
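    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Where does `1 / sqrt(12)` come from? If rounding errors are spread uniformly over\n",
        "one quantization step, `[-scale/2, scale/2]`, their variance is `scale**2 / 12`,\n",
        "so `RMSE / scale` is `1 / sqrt(12)`. The NumPy sketch below checks this with a\n",
        "simulated asymmetric int8 quantizer (a simplification, not the converter's exact\n",
        "calibration), and shows how a calibration range that is too narrow clips\n",
        "outliers and pushes `RMSE / scale` well above 0.289. The range is computed as\n",
        "`255 * scale`, the same way as the `range` column in this notebook:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "def fake_quantize_int8(x, min_val, max_val):\n",
        "  # Simulated asymmetric int8 quantization calibrated to [min_val, max_val].\n",
        "  scale = (max_val - min_val) / 255.0\n",
        "  zero_point = np.round(-128 - min_val / scale)\n",
        "  q = np.clip(np.round(x / scale) + zero_point, -128, 127)\n",
        "  # Dequantize back to float to measure the error.\n",
        "  return (q - zero_point) * scale, scale\n",
        "\n",
        "\n",
        "def rmse_over_scale(x, min_val, max_val):\n",
        "  x_hat, scale = fake_quantize_int8(x, min_val, max_val)\n",
        "  return np.sqrt(np.mean((x - x_hat) ** 2)) / scale, scale\n",
        "\n",
        "\n",
        "x = np.random.default_rng(0).normal(size=100_000)\n",
        "\n",
        "# Calibrated on the full observed range: the error is pure rounding noise.\n",
        "good, good_scale = rmse_over_scale(x, x.min(), x.max())\n",
        "# Calibrated on the 1st to 99th percentiles: the tails get clipped.\n",
        "lo, hi = np.percentile(x, [1, 99])\n",
        "clipped, clipped_scale = rmse_over_scale(x, lo, hi)\n",
        "\n",
        "print(f'full range: range={255 * good_scale:.3f}, rmse/scale={good:.3f}')\n",
        "print(f'clipped   : range={255 * clipped_scale:.3f}, rmse/scale={clipped:.3f}')\n",
        "print(f'1/sqrt(12) = {1 / np.sqrt(12):.3f}')\n",
        "```\n",
        "\n",
        "Real layers can also exceed the baseline for other reasons, such as error carried\n",
        "in from earlier quantized ops, so clipping is only one possible cause."
      ]
    },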
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mwviORyJN6e5"
      },
      "outputs": [],
      "source": [
        "layer_stats['range'] = 255.0 * layer_stats['scale']\n",
        "layer_stats['rmse/scale'] = (\n",
        "    np.sqrt(layer_stats['mean_squared_error']) / layer_stats['scale'])\n",
        "layer_stats[['op_name', 'range', 'rmse/scale']].head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oAAv35CdPvc4"
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(15, 5))\n",
        "ax1 = plt.subplot(121)\n",
        "ax1.bar(np.arange(len(layer_stats)), layer_stats['range'])\n",
        "ax1.set_ylabel('range')\n",
        "ax2 = plt.subplot(122)\n",
        "ax2.bar(np.arange(len(layer_stats)), layer_stats['rmse/scale'])\n",
        "ax2.set_ylabel('rmse/scale')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8pqUQvRUWB3Q"
      },
      "source": [
        "There are many layers with wide ranges, and some layers that have high\n",
        "`RMSE/scale` values. Let's get the layers with high error metrics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UqFsUX4_Q-cE"
      },
      "outputs": [],
      "source": [
        "layer_stats[layer_stats['rmse/scale'] \u003e 0.7][[\n",
        "    'op_name', 'range', 'rmse/scale', 'tensor_name'\n",
        "]]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DHeALFTGWl_e"
      },
      "source": [
        "With these layers, you can try selective quantization to see if not quantizing\n",
        "those layers improves model quality."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cvdkjsbwYC6e"
      },
      "outputs": [],
      "source": [
        "suspected_layers = list(\n",
        "    layer_stats[layer_stats['rmse/scale'] \u003e 0.7]['tensor_name'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "W6RQw9JobOTR"
      },
      "source": [
        "In addition to these, skipping quantization for the first few layers can also\n",
        "help improve the quantized model's quality."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ikF2bp6NZcXN"
      },
      "outputs": [],
      "source": [
        "suspected_layers.extend(list(layer_stats[:5]['tensor_name']))"
      ]
    },
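    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The two selection heuristics above can be combined into one reusable function.\n",
        "The sketch below uses a hypothetical helper, `select_denylist` (not a TensorFlow\n",
        "API), that takes a DataFrame shaped like `layer_stats`, keeps the tensors whose\n",
        "`rmse/scale` exceeds a threshold plus the first few layers, and removes\n",
        "duplicates while preserving order, since a tensor can match both rules. The toy\n",
        "statistics are made up for illustration:\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "\n",
        "def select_denylist(layer_stats, threshold=0.7, num_first_layers=5):\n",
        "  # Tensors whose normalized quantization error exceeds the threshold.\n",
        "  noisy = layer_stats.loc[layer_stats['rmse/scale'] \u003e threshold, 'tensor_name']\n",
        "  # The first few layers of the model, in the order they appear in the stats.\n",
        "  first = layer_stats['tensor_name'].iloc[:num_first_layers]\n",
        "  # dict.fromkeys drops duplicates while keeping first-seen order.\n",
        "  return list(dict.fromkeys(list(noisy) + list(first)))\n",
        "\n",
        "\n",
        "# Made-up statistics for a 6-layer model.\n",
        "toy_stats = pd.DataFrame({\n",
        "    'tensor_name': ['t0', 't1', 't2', 't3', 't4', 't5'],\n",
        "    'rmse/scale': [0.9, 0.3, 0.29, 0.31, 0.28, 1.2],\n",
        "})\n",
        "print(select_denylist(toy_stats, threshold=0.7, num_first_layers=2))\n",
        "```\n",
        "\n",
        "Applied to `layer_stats` with the defaults, this should return the same tensor\n",
        "names as `suspected_layers`, minus any duplicates."
      ]
    },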
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1DfT78w6W6Li"
      },
      "source": [
        "## Selective Quantization"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-pubC-01cGEH"
      },
      "source": [
        "Selective quantization skips quantization for some nodes, so that the\n",
        "calculation can happen in the original floating-point domain. When the right\n",
        "layers are skipped, we can expect some model quality recovery at the cost of\n",
        "increased latency and model size.\n",
        "\n",
        "However, if you're planning to run quantized models on integer-only accelerators\n",
        "(e.g. Hexagon DSP, EdgeTPU), selective quantization would fragment the model\n",
        "and slow down inference, mainly because of the cost of transferring data\n",
        "between the CPU and those accelerators. To prevent this, you can consider\n",
        "running\n",
        "[quantization aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training)\n",
        "to keep all the layers in integer while preserving the model accuracy."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EQFBfR7YW-oh"
      },
      "source": [
        "The quantization debugger's options accept `denylisted_nodes` and\n",
        "`denylisted_ops` for skipping quantization of specific layers, or of all\n",
        "instances of specific ops. Using the `suspected_layers` we prepared in the\n",
        "previous step, we can use the quantization debugger to get a selectively\n",
        "quantized model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "K5KD0JAEbpsv"
      },
      "outputs": [],
      "source": [
        "debug_options = tf.lite.experimental.QuantizationDebugOptions(\n",
        "    denylisted_nodes=suspected_layers)\n",
        "debugger = tf.lite.experimental.QuantizationDebugger(\n",
        "    converter=converter,\n",
        "    debug_dataset=representative_dataset(ds),\n",
        "    debug_options=debug_options)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pfj9gzv4b7h4"
      },
      "outputs": [],
      "source": [
        "selective_quantized_model = debugger.get_nondebug_quantized_model()\n",
        "eval_tflite(selective_quantized_model, ds)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1RkfMYSHdtZy"
      },
      "source": [
        "The accuracy is still lower than that of the original float model, but skipping\n",
        "quantization for ~10 out of 111 layers gives a notable improvement over the\n",
        "fully quantized model.\n",
        "\n",
        "You can also skip quantization for all ops of the same type. For example, to\n",
        "skip quantization for all mean ops, you can pass `MEAN` to `denylisted_ops`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ruUoP7SgcLpO"
      },
      "outputs": [],
      "source": [
        "debug_options = tf.lite.experimental.QuantizationDebugOptions(\n",
        "    denylisted_ops=['MEAN'])\n",
        "debugger = tf.lite.experimental.QuantizationDebugger(\n",
        "    converter=converter,\n",
        "    debug_dataset=representative_dataset(ds),\n",
        "    debug_options=debug_options)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oY6kb5g_cO4H"
      },
      "outputs": [],
      "source": [
        "selective_quantized_model = debugger.get_nondebug_quantized_model()\n",
        "eval_tflite(selective_quantized_model, ds)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xa8488TeAyx-"
      },
      "source": [
        "With these techniques, we are able to improve the quantized MobileNet V3 model\n",
        "accuracy. Next we'll explore advanced techniques to improve the model accuracy\n",
        "even more."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZD75cY9PUb2u"
      },
      "source": [
        "## Advanced usages\n",
        "\n",
        "With the following features, you can further customize your debugging pipeline."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aVj9yrQoUfGo"
      },
      "source": [
        "### Custom metrics\n",
        "\n",
        "By default, the quantization debugger emits five metrics for each float-quant\n",
        "difference: tensor size, standard deviation, mean error, max absolute error, and\n",
        "mean squared error. You can add more custom metrics by passing them to the\n",
        "debug options. Each metric must return a single float value, and the reported\n",
        "metric is the average of that value over all examples.\n",
        "\n",
        "*   `layer_debug_metrics`: calculates a metric from the difference between the\n",
        "    float and quantized outputs of each op.\n",
        "*   `layer_direct_compare_metrics`: rather than using only the difference,\n",
        "    calculates a metric from the raw float and quantized tensors and their\n",
        "    quantization parameters (scale, zero point).\n",
        "*   `model_debug_metrics`: **only used when `float_model_(path|content)` is\n",
        "    passed** to the debugger. In addition to the op-level metrics, the final\n",
        "    layer output is compared with the reference output from the original float\n",
        "    model."
      ]
    },
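    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because each custom metric must return a single float, it can help to test the\n",
        "metric functions on synthetic arrays before passing them to\n",
        "`QuantizationDebugOptions`. The sketch below assumes the argument lists used in\n",
        "this notebook's example: a `layer_debug_metrics` function receives the float-quant\n",
        "difference, a `layer_direct_compare_metrics` function receives the float tensor,\n",
        "quantized tensor, scale, and zero point, and a `model_debug_metrics` function\n",
        "receives the float and quantized model outputs. The tensors come from a\n",
        "hypothetical symmetric int8 quantizer, and flattened arrays stand in for model\n",
        "outputs:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "def mean_abs_error(diff):\n",
        "  return float(np.mean(np.abs(diff)))\n",
        "\n",
        "\n",
        "def correlation(f, q, s, zp):\n",
        "  # Dequantize the int8 tensor before comparing it with the float tensor.\n",
        "  return float(np.corrcoef(f.flatten(), (q.flatten() - zp) * s)[0, 1])\n",
        "\n",
        "\n",
        "def argmax_accuracy(f, q):\n",
        "  return float(np.mean(np.argmax(f) == np.argmax(q)))\n",
        "\n",
        "\n",
        "# Simulated float tensor and its int8 quantization (hypothetical scale and zero point).\n",
        "rng = np.random.default_rng(0)\n",
        "f = rng.normal(size=(1, 16, 16, 8)).astype(np.float32)\n",
        "s = float(np.max(np.abs(f))) / 127.0\n",
        "zp = 0\n",
        "q = np.clip(np.round(f / s) + zp, -128, 127).astype(np.int8)\n",
        "dequantized = (q.astype(np.float32) - zp) * s\n",
        "\n",
        "results = {\n",
        "    'mean_abs_error': mean_abs_error(f - dequantized),\n",
        "    'correlation': correlation(f, q, s, zp),\n",
        "    'argmax_accuracy': argmax_accuracy(f.flatten(), dequantized.flatten()),\n",
        "}\n",
        "print(results)\n",
        "```\n",
        "\n",
        "Wrapping each value in `float()` makes the single-float requirement explicit\n",
        "instead of relying on NumPy scalar types."
      ]
    },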
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WqmRQSxoVVwu"
      },
      "outputs": [],
      "source": [
        "debug_options = tf.lite.experimental.QuantizationDebugOptions(\n",
        "    layer_debug_metrics={\n",
        "        'mean_abs_error': (lambda diff: np.mean(np.abs(diff)))\n",
        "    },\n",
        "    layer_direct_compare_metrics={\n",
        "        # Dequantize q as (q - zp) * s before correlating it with f.\n",
        "        'correlation':\n",
        "            lambda f, q, s, zp: (np.corrcoef(f.flatten(),\n",
        "                                             (q.flatten() - zp) * s)[0, 1])\n",
        "    },\n",
        "    model_debug_metrics={\n",
        "        'argmax_accuracy': (lambda f, q: np.mean(np.argmax(f) == np.argmax(q)))\n",
        "    })\n",
        "\n",
        "debugger = tf.lite.experimental.QuantizationDebugger(\n",
        "    converter=converter,\n",
        "    debug_dataset=representative_dataset(ds),\n",
        "    debug_options=debug_options)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PVQ4nEicXz2l"
      },
      "outputs": [],
      "source": [
        "debugger.run()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "dfKA90csX9UL"
      },
      "outputs": [],
      "source": [
        "CUSTOM_RESULTS_FILE = '/tmp/debugger_results_custom.csv'\n",
        "with open(CUSTOM_RESULTS_FILE, 'w') as f:\n",
        "  debugger.layer_statistics_dump(f)\n",
        "\n",
        "custom_layer_stats = pd.read_csv(CUSTOM_RESULTS_FILE)\n",
        "custom_layer_stats[['op_name', 'mean_abs_error', 'correlation']].tail()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Qqq30oWsZF5b"
      },
      "source": [
        "The result of `model_debug_metrics` can be accessed separately through\n",
        "`debugger.model_statistics`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wrXlmzEHYhQ5"
      },
      "outputs": [],
      "source": [
        "debugger.model_statistics"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DqJBLIsoUyIg"
      },
      "source": [
        "### Using (internal) mlir_quantize API to access in-depth features\n",
        "\n",
        "Note: Some features in the following section,\n",
        "`TFLiteConverter._experimental_calibrate_only` and `convert.mlir_quantize`, are\n",
        "experimental internal APIs, and are subject to change in non-backward-compatible\n",
        "ways."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VJm66Cz-XpeF"
      },
      "outputs": [],
      "source": [
        "from tensorflow.lite.python import convert"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2krUVzpiUp3u"
      },
      "source": [
        "#### Whole model verify mode\n",
        "\n",
        "The default behavior for debug model generation is per-layer verify. In this\n",
        "mode, the inputs of each float and quantized op pair come from the same source\n",
        "(the previous quantized op). Another mode is whole-model verify, where the float\n",
        "and quantized models are run separately. This mode is useful for observing how\n",
        "the error propagates down the model. To enable it, pass\n",
        "`enable_whole_model_verify=True` to `convert.mlir_quantize` when generating the\n",
        "debug model manually."
      ]
    },
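    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To build intuition for the difference between the two modes, the toy NumPy\n",
        "simulation below (an illustration, not how the debug model is actually built)\n",
        "runs a two-layer network with round-to-grid fake quantization and no clipping.\n",
        "In per-layer mode, the float and quantized versions of the second layer both\n",
        "consume the quantized output of the first layer, so the measured error is only\n",
        "that layer's own rounding error. In whole-model mode, the float path never sees\n",
        "quantized values, so the second layer's error also includes error propagated\n",
        "from the first layer. The layer sizes and scales are arbitrary choices:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "rng = np.random.default_rng(0)\n",
        "x = rng.normal(size=(256, 64))\n",
        "w1 = rng.normal(size=(64, 64)) / np.sqrt(64)\n",
        "w2 = rng.normal(size=(64, 64)) / np.sqrt(64)\n",
        "\n",
        "\n",
        "def fake_quant(t, scale):\n",
        "  # Round to the quantization grid; clipping is ignored for simplicity.\n",
        "  return np.round(t / scale) * scale\n",
        "\n",
        "\n",
        "def layer1(t):\n",
        "  return np.maximum(t @ w1, 0.0)\n",
        "\n",
        "\n",
        "def layer2(t):\n",
        "  return t @ w2\n",
        "\n",
        "\n",
        "# Deliberately coarse scale for layer 1, fine scale for layer 2.\n",
        "s1, s2 = 0.1, 0.001\n",
        "\n",
        "y1_float = layer1(x)\n",
        "y1_quant = fake_quant(layer1(x), s1)\n",
        "\n",
        "# Per-layer verify: both versions of layer 2 take the quantized layer-1 output.\n",
        "per_layer_diff = layer2(y1_quant) - fake_quant(layer2(y1_quant), s2)\n",
        "# Whole-model verify: the float path runs end to end without quantization.\n",
        "whole_model_diff = layer2(y1_float) - fake_quant(layer2(y1_quant), s2)\n",
        "\n",
        "per_layer_mse = np.mean(per_layer_diff ** 2)\n",
        "whole_model_mse = np.mean(whole_model_diff ** 2)\n",
        "print('per-layer MSE  :', per_layer_mse)\n",
        "print('whole-model MSE:', whole_model_mse)\n",
        "```\n",
        "\n",
        "The per-layer MSE stays near the rounding baseline `s2**2 / 12`, while the\n",
        "whole-model MSE is dominated by the error carried over from the coarsely\n",
        "quantized first layer."
      ]
    },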
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5zykINDlVLSg"
      },
      "outputs": [],
      "source": [
        "converter = tf.lite.TFLiteConverter.from_keras_model(model)\n",
        "converter.representative_dataset = representative_dataset(ds)\n",
        "converter.optimizations = [tf.lite.Optimize.DEFAULT]\n",
        "converter._experimental_calibrate_only = True\n",
        "calibrated_model = converter.convert()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "eqvXlEiFXfSu"
      },
      "outputs": [],
      "source": [
        "# Note that enable_numeric_verify and enable_whole_model_verify are set.\n",
        "quantized_model = convert.mlir_quantize(\n",
        "    calibrated_model,\n",
        "    enable_numeric_verify=True,\n",
        "    enable_whole_model_verify=True)\n",
        "debugger = tf.lite.experimental.QuantizationDebugger(\n",
        "    quant_debug_model_content=quantized_model,\n",
        "    debug_dataset=representative_dataset(ds))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xQ6TFsXQVHMe"
      },
      "source": [
        "#### Selective quantization from an already calibrated model\n",
        "\n",
        "You can directly call `convert.mlir_quantize` to get a selectively quantized\n",
        "model from an already calibrated model. This is particularly useful when you\n",
        "want to calibrate the model once and experiment with various denylist\n",
        "combinations."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZCS-Fa9lbdc0"
      },
      "outputs": [],
      "source": [
        "selective_quantized_model = convert.mlir_quantize(\n",
        "    calibrated_model, denylisted_nodes=suspected_layers)\n",
        "eval_tflite(selective_quantized_model, ds)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "Eq_8T2oauIED"
      ],
      "name": "quantization_debugger.ipynb",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
